Abstract. We present a novel bio-inspired and dynamic coding scheme
for static images. Our coder aims at reproducing the main steps of the
visual stimulus processing in the mammalian retina, taking into account
its time behavior. The main novelty of this work is to show how to
exploit the time behavior of the retina cells to ensure, in a simple way,
scalability and bit allocation. To do so, our main source of inspiration is
the biologically plausible retina model called Virtual Retina. Following
a similar structure, our model has two stages. The first stage is an image
transform, which is performed by the outer layers of the retina. Here it
is modelled by filtering the image with a bank of differences of Gaussians
with time-delays. The second stage is a time-dependent analog-to-digital
conversion, which is performed by the inner layers of the retina. By design,
our coder enables scalability and bit allocation across
time. Also, our decoded images do not show annoying artefacts such as
ringing and blocking effects. As a whole, this article shows how to capture
the main properties of a biological system, here the retina, in order to
design a new efficient coder.
1 Introduction

Intensive efforts have been made during the past two decades to design
lossy image coders, yielding several standards such as JPEG and JPEG2000 [1, 3].
These compression algorithms mostly follow the same design scheme, while
considerably improving performance in terms of cost and quality.
Yet it has now become clear that little remains to be gained unless a shift is
made in the philosophy underlying coder design.

In this paper, we propose a novel image codec based on visual system
properties: our aim is to set a new framework for coder design. In this context,
neurophysiological studies "have demonstrated that our sensory systems are
remarkably efficient at coding the sensory environment" [8], and we are convinced
that an interdisciplinary approach would improve coding algorithms.



We focused on the complex computations that the mammalian retina
performs to transform the incoming light stimulus into a set of uniformly-shaped
impulses, also called spikes. Indeed, recent studies such as [7] confirmed that the
retina performs non-trivial operations on the input signal before transmission, so
our goal here is to capture the main properties of the retina processing for
the design of our new coder.

Several works in the literature have reproduced fragments of this retina processing
through bio-inspired models for various vision tasks, for example: object
detection and robot movement decision [9], fast categorization [19, 20], and
region of interest detection for bit allocation [13]. But most of these approaches do not
account for the precise retina processing. Besides, these models overlooked the
signal recovery problem, which is crucial in the coding application. Attempts in
this direction were made with heavy simplifications at the expense of biological
relevance [14], or by restricting the decoding ability to a set of signals in a
dictionary [15]. Here, the originality of our work is twofold: we focus explicitly on
the coding application, and we keep our design as close as possible to biological
reality by considering most of the mammalian retina processing features.

Our main source of inspiration is the biologically plausible Virtual
Retina model [23], whose goal was to find the best compromise between
biological reality and the possibility of running large-scale simulations. Based on this
model, we propose a coding scheme following the architecture and functionalities
of the retina, with some adaptations required by the application.

This paper is organized as follows. In Section 2, we revisit the retina model
called Virtual Retina [23]. In Section 3, we show how this retina model can
be used as the basis of a novel bio-inspired image coder. The coding pathway
is presented in a classical way, distinguishing two stages: the image transform
and the analog-to-digital (A/D) converter. In Section 4, we present the decoding
pathway. In Section 5, we show the main results that demonstrate the properties
of our model. In Section 6, we summarize our main conclusions.

2 Virtual Retina: a bio-inspired retina model 

The motivation of our work is to investigate the functional architecture of the
retina and use it as a design basis for new codecs. So it is essential to understand
the main functional principles of retina processing. The computational neuroscience
literature proposes many different retina models, ranging from detailed models of
a specific physiological phenomenon to large-scale models of the whole retina.

In this article, we focus on the category of large-scale retina models, as we
are interested in a model that gathers the main features of the mammalian retina.
Within this category, we consider the retina model called Virtual Retina [23].
This model is one of the most complete in the literature, as it encompasses
the major features of the actual mammalian retina. It largely reflects the state
of the art, and its authors confirmed its relevance by accurately reproducing real
cell recordings in several experiments.



[Figure 1 panels: (a) Virtual Retina, (b) Coder, (c) Decoder]

Fig. 1. (a) Schematic view of the Virtual Retina model proposed by [23]. (b) and
(c): Overview of our bio-inspired codec. Given an image, the static DoG-based
multi-scale transform generates the sub-bands $\{F_k\}$. DoG filters are sorted from
the lowest frequency-band filter $DoG_0$ to the highest one $DoG_{N-1}$. Each sub-band
$F_k$ is delayed using a time-delay circuit $D_{t_k}$, with $t_k < t_{k+1}$. The
time-delayed multi-scale output is then made available to the subsequent coder stages.
The final output of the coder is a set of spike series, and the coding feature adopted
is the spike count $n_{kij}(t_{obs})$ recorded for each neuron indexed by $(k, i, j)$
at a given time $t_{obs}$.



The architecture of the Virtual Retina model follows the structure of the
mammalian retina, as schematized in Figure 1(a). The model has several
interconnected layers, and three main processing steps can be distinguished:

- Outer layers: The first processing step is described by non-separable
spatio-temporal filters, behaving as time-dependent edge detectors. This is a
classical step implemented in several retina models.

- Inner layers: A non-linear contrast gain control is performed. This step
mainly models bipolar cells by control circuits with time-varying conductances.

- Ganglionic layer: Leaky integrate-and-fire neurons are implemented to model
the ganglionic layer processing that finally converts the stimulus into spikes.

Taking this model as a basis, our goal is to adapt it in order to design the new
codec presented in the next sections.



3 The coding pathway 



The coding pathway is schematized in Figure 1(b). It follows the same
architecture as Virtual Retina. However, since we also have to define a decoding
pathway, we need to consider the invertibility of each processing stage. For this
reason, some adaptations are required; they are described in this section.

3.1 The image transform: The outer retina layers 

In Virtual Retina, the outer layers are modelled by a non-separable
spatio-temporal filtering. This processing produces responses corresponding to spatial
or temporal variations of the signal because it models time-dependent
interactions between two low-pass filters: this is termed center-surround differences.
This stage has the property that it responds first to low spatial frequencies and
later to higher frequencies. This time-dependent frequency integration was shown
for Virtual Retina (see [24]) and it was confirmed experimentally (see, e.g., [17]).
This property is interesting, as a large amount of the total signal energy is
contained in the lower frequency sub-bands, whereas high frequencies bring further
details. This idea has already motivated bit allocation algorithms that concentrate
resources on the lower frequencies to ensure a good recovery.

However, inverting this non-separable spatio-temporal filtering is a complex
problem [24, 25]. To overcome this difficulty, we propose to model this stage
differently while keeping its essential features. To do so, we decompose this
process into two steps. The first one considers only center-surround differences
in the spatial domain (through differences of Gaussians), which is justified by
the fact that our coder takes static images as input. The second step reproduces
the time-dependent frequency integration by introducing time-delays.

Center-surround differences in the spatial domain: DoG. Neurophysiological
experiments have shown that, as for classical image coders, the retina
encodes the stimulus representation in a transform domain. The retinal stimulus
transform is performed in the cells of the outer layers, mainly in the outer
plexiform layer (OPL). Quantitative studies such as [6, 16] have shown that the
processing of OPL cells can be approximated by a linear filtering. In particular,
the authors of [6] proposed the widely adopted DoG filter, a weighted
difference of spatial Gaussians defined as follows:

$$DoG(x, y) = w_c\, G_{\sigma_c}(x, y) - w_s\, G_{\sigma_s}(x, y), \qquad (1)$$

where $w_c$ and $w_s$ are the respective weights of the center and surround
components of the receptive fields, and $\sigma_c$ and $\sigma_s$ are the standard
deviations of the Gaussian kernels $G_{\sigma_c}$ and $G_{\sigma_s}$.
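As an illustration of (1), a DoG kernel can be sampled on a discrete grid. The sketch below is a minimal NumPy version under stated assumptions: unit-sum discrete Gaussians, equal weights $w_c = w_s = 1$ (so the kernel has zero mean and rejects the DC component), and truncation at $3\sigma_s$. The function names `gaussian_2d` and `dog_kernel` and the example scales are hypothetical choices, not values taken from the paper.

```python
import numpy as np

def gaussian_2d(sigma, radius):
    """Unit-sum 2-D Gaussian G_sigma sampled on a (2*radius+1) x (2*radius+1) grid."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def dog_kernel(sigma_c, sigma_s, w_c=1.0, w_s=1.0, radius=None):
    """DoG(x, y) = w_c * G_{sigma_c}(x, y) - w_s * G_{sigma_s}(x, y), as in (1)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma_s))  # cover +/- 3 std of the wider surround
    return w_c * gaussian_2d(sigma_c, radius) - w_s * gaussian_2d(sigma_s, radius)

k = dog_kernel(sigma_c=1.0, sigma_s=2.0)  # 13 x 13 kernel: positive center, negative surround
```

With equal weights, the center lobe is positive and the outer ring negative, which is the band-pass, center-surround profile the text describes.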

In terms of implementation, as in [20], the DoG cells can be arranged in a
dyadic grid to sweep the whole stimulus spectrum, as schematized in Figure 2(b).
Each layer $k$ of the grid is tiled with $DoG_k$ cells having a scale $s_k$ and
generating a transform sub-band $F_k$, where $\sigma_{s_{k+1}} = \frac{1}{2}\sigma_{s_k}$.
So, in order to measure the degree of activation $I^{OPL}_{kij}$ of a given $DoG_k$
cell at the location $(i, j)$ with a scale $s_k$, we compute the convolution of the
original image $f$ with the $DoG_k$ filter:

$$I^{OPL}_{kij} = \sum_{x,y=-\infty}^{+\infty} DoG_k(i - x, j - y)\, f(x, y). \qquad (2)$$

This generates a set of $\frac{4}{3}N^2 - 1$ coefficients for an $N^2$-sized image, as
it works in the same fashion as a Laplacian pyramid [2]. An example of such a
bio-inspired multi-scale decomposition is shown in Figure 2(c). Note that we added
to this bank of filters a Gaussian low-pass scaling function that represents the
state of the OPL filters at the time origin. This yields a low-pass coefficient
$I^{OPL}_{000}$ and enables the recovery of a low-pass residue at the reconstruction
level [5, 12].
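The following sketch evaluates (2) on a dyadic grid. It is only an illustration under stated assumptions: a square image of side $N$, layer $k$ sampled at the centers of a $2^k \times 2^k$ grid of cells, surround scale $\sigma_{s_k} = \sigma_{s_0}/2^k$, a hypothetical center-to-surround ratio of 2, and unit weights. The low-pass coefficient $I^{OPL}_{000}$ is omitted, and the helper names `dog` and `dyadic_dog_coefficients` are invented for the example.

```python
import numpy as np

def dog(x, y, sigma_c, sigma_s, w_c=1.0, w_s=1.0):
    """DoG of (1) evaluated pointwise; each Gaussian has unit integral."""
    r2 = x ** 2 + y ** 2
    g_c = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    g_s = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return w_c * g_c - w_s * g_s

def dyadic_dog_coefficients(f, K, sigma_s0, ratio=2.0):
    """Hypothetical dyadic evaluation of (2): returns K sub-bands, F_k of shape (2^k, 2^k)."""
    N = f.shape[0]
    xs, ys = np.mgrid[0:N, 0:N]  # pixel coordinates: x along axis 0, y along axis 1
    bands = []
    for k in range(K):
        n_cells = 2 ** k
        stride = N / n_cells              # cell spacing in pixels
        sigma_s = sigma_s0 / 2 ** k       # sigma_{s_{k+1}} = sigma_{s_k} / 2
        sigma_c = sigma_s / ratio         # assumed center/surround ratio
        band = np.empty((n_cells, n_cells))
        for i in range(n_cells):
            for j in range(n_cells):
                cx = (i + 0.5) * stride - 0.5   # cell center, in pixel units
                cy = (j + 0.5) * stride - 0.5
                band[i, j] = np.sum(dog(cx - xs, cy - ys, sigma_c, sigma_s) * f)
        bands.append(band)
    return bands
```

The transform is linear in $f$, which is what later makes an exact inverse through dual filters possible.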




Fig. 2. (a) Input Lena image. (b) Example of a dyadic grid of DoGs used for the
image analysis (from [20]). (c) DoG coefficients generated by the retina model
for image (a) (the sub-bands are shown on a logarithmic scale).



Integrating time dynamics through time-delay circuits. Of course, the
model described above has no dynamical properties. In the actual retina, the
surround $G_{\sigma_s}$ in (1) appears progressively across time, driving the filter
passband from low frequencies to higher ones. Our goal is to reproduce this
phenomenon, which we call time-dependent frequency integration. To do so, we add
to the coding pathway of each sub-band a time-delay circuit $D_{t_k}$. The value of
$t_k$ is specific to $F_k$ and increases linearly as a function of $k$. The
$t_k$-delay causes the sub-band $F_k$ to be transmitted to the subsequent stages of
the coder starting from time $t_k$. The time-delayed activation coefficient
$I^{OPL}_{kij}(t)$ computed at the location $(i, j)$ for the scale $s_k$ at time $t$
is now defined as follows:

$$I^{OPL}_{kij}(t) = I^{OPL}_{kij}\, \mathbb{1}_{\{t \geq t_k\}}(t), \qquad (3)$$

where $\mathbb{1}_{\{t \geq t_k\}}$ is the indicator function such that
$\mathbb{1}_{\{t \geq t_k\}}(t) = 0$ if $t < t_k$ and $1$ otherwise.
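Equation (3) amounts to gating each coefficient with a step that opens at $t_k$. The sketch below assumes delays expressed in milliseconds, with a hypothetical offset `t0` and spacing `dt`; the paper only states that $t_k$ grows linearly with $k$, not these values.

```python
def linear_delays(K, t0, dt):
    """Hypothetical linearly increasing delays t_k = t0 + k * dt for k = 0..K-1."""
    return [t0 + k * dt for k in range(K)]

def delayed_coefficient(I_kij, t, t_k):
    """Equation (3): I_kij(t) = I_kij * 1_{t >= t_k}(t)."""
    return I_kij if t >= t_k else 0.0

delays = linear_delays(K=4, t0=10, dt=5)  # illustrative values, in ms
active = [k for k, t_k in enumerate(delays) if t_k <= 17]  # sub-bands already transmitted at t = 17 ms
```

At $t = 17$ ms only the two lowest-frequency sub-bands have reached the later stages, which is the low-to-high ordering the delays are meant to reproduce.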



3.2 The A/D converter: inner and ganglionic layers 

The retinal A/D converter is defined based on the processing occurring in the
inner and ganglionic layers, namely a contrast gain control, a non-linear
rectification, and a discretization based on leaky integrate-and-fire (LIF)
neurons [10]. A different treatment is performed for each delayed sub-band, and
this produces a natural bit allocation mechanism. Indeed, as each sub-band $F_k$ is
presented at a different time $t_k$, it is transformed according to the state of
our dynamic A/D converter at $t_k$.



Contrast gain control. The retina adjusts its operational range to match the
magnitude range of the input stimuli. This is done by an operation called contrast
gain control, mainly performed in the bipolar cells. Indeed, the conductance of real
bipolar cells is time-varying, resulting in a phenomenon termed shunting inhibition.
This shunting prevents the system from saturating by reducing high magnitudes.
In Virtual Retina, given the scalar magnitude $I^{OPL}_{kij}$ of the input step
current, the contrast gain control is a non-linear operation on the potential of the
bipolar cells. This potential varies according to both the time and the magnitude
value $I^{OPL}_{kij}$, and will be denoted by $V^{Bip}_{kij}(t, I^{OPL}_{kij})$.

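This phenomenon is modelled, for a constant value of $I^{OPL}_{kij}$, by the
following differential equation:

$$\begin{cases} \dfrac{dV^{Bip}_{kij}(t, I^{OPL}_{kij})}{dt} + g^A(t)\, V^{Bip}_{kij}(t, I^{OPL}_{kij}) = I^{OPL}_{kij}, & \text{for } t \geq 0, \\ g^A(t) = \left(E_{\tau_A} * Q\big(V^{Bip}_{kij}(\cdot, I^{OPL}_{kij})\big)\right)(t), \end{cases} \qquad (4)$$

where $Q(V^{Bip}_{kij}) = g^0_A + \lambda_A \left(V^{Bip}_{kij}\right)^2$ and
$E_{\tau_A}(t) = \frac{1}{\tau_A}\exp\left(-\frac{t}{\tau_A}\right)$, for $t \geq 0$.
Figure 3(a) shows the time behavior of $V^{Bip}_{kij}(t, I^{OPL}_{kij})$ for
different magnitude values $I^{OPL}_{kij}$.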
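To see how (4) produces contrast gain control, it can be integrated numerically. The sketch below uses forward Euler and replaces the convolution with $E_{\tau_A}$ by the equivalent first-order low-pass ODE $\tau_A\, dg/dt = Q(V) - g$ (the exponential kernel is exactly the impulse response of that filter). It assumes the cell is at rest before $t = 0$ ($V = 0$, $g = g^0_A$) and uses dimensionless illustrative parameters, not the paper's values; `bipolar_response` is a hypothetical name.

```python
def bipolar_response(I, T, dt, g0_A, lambda_A, tau_A):
    """Forward-Euler sketch of (4) for a constant input I switched on at t = 0.

    dV/dt = I - g(t) V, with g = E_{tau_A} * Q(V) and Q(v) = g0_A + lambda_A v^2.
    The convolution is realized by the equivalent low-pass ODE tau_A dg/dt = Q(V) - g.
    Returns the sampled potential V(n dt), n = 1..T/dt.
    """
    V, g = 0.0, g0_A  # assumed resting state before the step
    trace = []
    for _ in range(int(round(T / dt))):
        dV = I - g * V
        dg = (g0_A + lambda_A * V * V - g) / tau_A
        V += dt * dV
        g += dt * dg
        trace.append(V)
    return trace

low = bipolar_response(I=1.0, T=5.0, dt=1e-3, g0_A=1.0, lambda_A=1.0, tau_A=0.01)
high = bipolar_response(I=10.0, T=5.0, dt=1e-3, g0_A=1.0, lambda_A=1.0, tau_A=0.01)
```

At equilibrium $g = g^0_A + \lambda_A V^2$ and $gV = I$. With $g^0_A = \lambda_A = 1$, a tenfold increase of $I$ (from 1 to 10) raises the steady-state potential only from about 0.68 to 2, which is the compressive behavior attributed to shunting inhibition.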



Non-linear rectification. Then, the potential $V^{Bip}_{kij}(t, I^{OPL}_{kij})$ is
subject to a non-linear rectification yielding the so-called ganglionic current
$I^{Gang}_{kij}(t, I^{OPL}_{kij})$. Virtual Retina models it, for a constant scalar
value $I^{OPL}_{kij}$, by:

$$I^{Gang}_{kij}(t, I^{OPL}_{kij}) = N\left(T_{w_G,\tau_G} * V^{Bip}_{kij}(t, I^{OPL}_{kij})\right), \quad \text{for } t \geq 0, \qquad (5)$$

where $w_G$ and $\tau_G$ are constant scalar parameters, $T_{w_G,\tau_G}$ is the
linear transient filter defined by $T_{w_G,\tau_G}(t) = \delta_0(t) - w_G E_{\tau_G}(t)$,
and $N$ is defined by:

$$N(v) = \begin{cases} \dfrac{(i^0_G)^2}{i^0_G - \lambda_G (v - v^0_G)}, & \text{if } v < v^0_G, \\ i^0_G + \lambda_G (v - v^0_G), & \text{if } v \geq v^0_G, \end{cases}$$

where $i^0_G$, $v^0_G$ and $\lambda_G$ are constant scalar parameters. Figure 3(b)
shows the time behavior of $I^{Gang}_{kij}(t, I^{OPL}_{kij})$ for different values
of $I^{OPL}_{kij}$.
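Both ingredients of (5) are easy to sketch. Below, the transient filter $T_{w_G,\tau_G}$ is discretized by computing $E_{\tau_G} * v$ as a first-order low-pass (forward Euler, assuming $v = 0$ for $t < 0$), and $N$ is implemented exactly as defined above. The parameter values are illustrative only and the function names are hypothetical.

```python
def transient_filter(v_trace, dt, w_G, tau_G):
    """Discrete T_{w_G,tau_G} * v with T(t) = delta_0(t) - w_G E_{tau_G}(t).

    E_{tau_G} * v is computed as a first-order low-pass (forward Euler), assuming v = 0 for t < 0.
    """
    out, low = [], 0.0
    for v in v_trace:
        low += dt * (v - low) / tau_G
        out.append(v - w_G * low)
    return out

def rectify_N(v, i0_G, v0_G, lambda_G):
    """Static non-linearity N of (5): hyperbolic below v0_G, linear above, continuous at v0_G."""
    if v < v0_G:
        return i0_G ** 2 / (i0_G - lambda_G * (v - v0_G))
    return i0_G + lambda_G * (v - v0_G)

# Ganglionic current for a constant unit potential (illustrative parameters only).
filtered = transient_filter([1.0] * 1000, dt=1e-3, w_G=0.8, tau_G=0.01)
current = [rectify_N(u, i0_G=2.0, v0_G=0.0, lambda_G=1.0) for u in filtered]
```

The transient filter makes the response peak at onset and then decay towards $1 - w_G$ times the input, while $N$ keeps the output strictly positive for any input, which is the rectification.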

As the currents $I^{OPL}_{kij}$ are delayed by times $\{t_k\}$, our goal is to
capture the instantaneous behavior of the inner layers at these times $\{t_k\}$.
This amounts to inferring the transforms $f^{Gang}_{t_k}$ that map a given scalar
magnitude $I^{OPL}_{kij}$ into a rectified current $I^{Gang}_{kij}$, as the modelled
inner layers would generate it at $t_k$. To do so, we start from the time-varying
curves of $I^{Gang}_{kij}(t, I^{OPL}_{kij})$ in Figure 3(b) and we make a transversal
cut at each time $t_k$. We show in Figure 3(c) the resulting maps $f^{Gang}_{t_k}$,
such that $I^{Gang}_{kij}(t_k, I^{OPL}_{kij}) = f^{Gang}_{t_k}(I^{OPL}_{kij})$.

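As for $I^{OPL}_{kij}(t)$ (see (3)), we introduce the time dimension using the
indicator function $\mathbb{1}_{\{t \geq t_k\}}(t)$. The final output of this stage is
the set of step functions $I^{Gang}_{kij}(t)$ defined by:

$$I^{Gang}_{kij}(t) = I^{Gang}_{kij}\, \mathbb{1}_{\{t \geq t_k\}}(t), \quad \text{with } I^{Gang}_{kij} = f^{Gang}_{t_k}(I^{OPL}_{kij}). \qquad (6)$$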
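The transversal cut that defines $f^{Gang}_{t_k}$ is a resampling of pre-computed response curves at a fixed time, followed by the gating of (6). A minimal sketch, assuming the curves are stored on a uniform time grid of step `dt` (a hypothetical storage format) and that $t_k$ falls on that grid; `transversal_cut` and `gated_output` are illustrative names.

```python
def transversal_cut(curves, magnitudes, t_k, dt):
    """Tabulate f_{t_k}: sample each response curve at t = t_k.

    curves[m][n] is a hypothetical pre-computed response I^Gang(n * dt, magnitudes[m]),
    e.g. obtained by simulating (4)-(5) for each input magnitude.
    Returns (magnitude, response at t_k) pairs, sorted by magnitude.
    """
    n = int(round(t_k / dt))
    return sorted((I, curve[n]) for I, curve in zip(magnitudes, curves))

def gated_output(value, t, t_k):
    """Equation (6): I^Gang_kij(t) = I^Gang_kij * 1_{t >= t_k}(t)."""
    return value if t >= t_k else 0.0

# Toy curves that grow linearly in time, for two input magnitudes.
table = transversal_cut([[2.0 * n for n in range(10)], [1.0 * n for n in range(10)]],
                        [2.0, 1.0], t_k=2.0, dt=0.5)
```

One such table per delay $t_k$ is all the coder needs at run time: the expensive dynamics of (4)-(5) are simulated once, off-line.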



This non-linear rectification is analogous to a widely used telecommunication
technique: companding [4]. Companders are used to make the quantization
steps unequal after a linear gain control stage. However, unlike A-law or μ-law
companders, which amplify low magnitudes, the inner layers emphasize high
magnitudes in the signal. Besides, the inner layers stage has a time-dependent
behavior, whereas a usual gain controller/compander is static, and this makes
our A/D converter go beyond the standard ones.
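For comparison, the sketch below implements the standard μ-law compressor used in telephony (μ = 255 in ITU-T G.711) together with its inverse. It shows the static behavior the text contrasts with the inner layers: small magnitudes are expanded (0.01 is mapped to roughly 0.23), so uniform quantization steps in the compressed domain are finer near zero. It is not part of the retina model.

```python
import math

def mu_law(x, mu=255.0):
    """Standard mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_inverse(y, mu=255.0):
    """Inverse of mu_law for y in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

The curve is the same at every instant; the retinal mapping $f^{Gang}_{t_k}$, by contrast, changes with $t_k$.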



Leaky integrate-and-fire quantization: The ganglionic layer is the deepest
one tiling the retina: it transforms a continuous signal into discrete sets
of spike trains. As in Virtual Retina, this stage is modelled by leaky
integrate-and-fire (LIF) neurons, a classical model. One LIF neuron is associated
with every position in each sub-band $F_k$. The time behavior of a LIF neuron is
governed by the fluctuation of its voltage $V_{kij}(t)$. Whenever $V_{kij}(t)$ reaches
a predefined threshold $\delta$, a spike is emitted and the voltage goes back to a
resting potential $V_0$. Between two spike emission times, $t^{(l)}_{kij}$ and
$t^{(l+1)}_{kij}$, the potential evolves according to the following differential
equation:

$$C\, \frac{dV_{kij}(t)}{dt} + g^L\, V_{kij}(t) = I^{Gang}_{kij}(t), \quad \forall t \in \left[t^{(l)}_{kij}, t^{(l+1)}_{kij}\right], \qquad (7)$$

where $g^L$ is a constant conductance and $C$ is a constant capacitance. In the
literature, neuron activity is commonly characterized by the count of spikes
emitted during an observation time bin $[0, t_{obs}]$, which we denote by
$n_{kij}(t_{obs})$ [22]. As $n_{kij}(t_{obs})$ encodes the value of $I^{Gang}_{kij}(t)$
but is an integer, there is a loss of information: the LIF thus performs a
quantization. If we observe the instantaneous behavior of the ganglionic layer at
different times $t_{obs}$, we get a quasi-uniform scalar quantizer that refines in
time. We can do this by a process similar to the one described in the previous
paragraph. We show in Figure 3(d) the resulting maps $f^{LIF}_{t_{obs}}$ such that
$n_{kij}(t_{obs}) = f^{LIF}_{t_{obs}}(I^{Gang}_{kij})$.
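Because (7) is linear between spikes, the spike count for a step input has a closed form, which makes the quantizer view concrete. A minimal sketch under stated assumptions: initial and reset potential $V_0$, no refractory period, and illustrative unit parameters rather than the values used in the experiments; `lif_spike_count` is a hypothetical helper.

```python
import math

def lif_spike_count(I, t_obs, t_k, C, g_L, delta, V0=0.0):
    """Spike count n(t_obs) of a LIF neuron obeying (7), driven by the step I * 1_{t >= t_k}.

    Assumptions: V = V0 at t = t_k, reset to V0 after each spike, no refractory period.
    Between spikes V relaxes towards V_inf = I / g_L with time constant C / g_L, so the
    inter-spike interval is (C / g_L) * ln((V_inf - V0) / (V_inf - delta)).
    """
    if t_obs <= t_k:
        return 0
    V_inf = I / g_L
    if V_inf <= delta:  # the threshold is never reached
        return 0
    isi = (C / g_L) * math.log((V_inf - V0) / (V_inf - delta))
    return int((t_obs - t_k) // isi)
```

For $I/g^L \gg \delta$ and $V_0 = 0$, the interval is approximately $C\delta/I$, so $n_{kij}(t_{obs}) \approx (t_{obs} - t_k)\, I/(C\delta)$: a nearly uniform quantizer whose step $C\delta/(t_{obs} - t_k)$ shrinks as $t_{obs}$ grows. This matches the refinement in time described above.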

Based on the set $\{n_{kij}(t_{obs})\}$ measured at the output of our coder, we
describe in the next section the decoding pathway that recovers the initial image.



4 The decoding pathway 

The decoding pathway is schematized in Figure 1(c). It consists of inverting,
step by step, each coding stage described in Section 3. At a given time $t_{obs}$,
the coding data is the set of $\frac{4}{3}N^2 - 1$ spike counts $n_{kij}(t_{obs})$;
this section describes how we can recover an estimate $\tilde{f}_{t_{obs}}$ of the
$N^2$-sized input image $f(x, y)$. Naturally, the recovered image
$\tilde{f}_{t_{obs}}(x, y)$ depends on the time $t_{obs}$, which ensures
time-scalability: the quality of the reconstruction improves as $t_{obs}$ increases.
The ganglionic and inner layers are inverted using look-up tables constructed
off-line, and the image is finally recovered by a direct inverse transform of the
outer layers processing.



Recovering the input of the ganglionic layer: First, given a spike count
$n_{kij}(t_{obs})$, we recover $\hat{I}^{Gang}_{kij}(t_{obs})$, the estimate of
$I^{Gang}_{kij}(t_{obs})$. To do so, we compute off-line the look-up table
$\mathrm{LUT}_{t_{obs}}(I^{Gang}_{kij})$ that maps the set of current magnitude values
$I^{Gang}_{kij}$ into spike counts at a given observation time $t_{obs}$ (see
Figure 3(d)). The reverse mapping is done by a simple interpolation in the reverse
look-up table denoted $\mathrm{LUT}^{-1}_{t_{obs}}$. Here we draw the reader's
attention to the fact that, as the input of the ganglionic layer is delayed, each
coefficient of the sub-band $F_k$ is decoded according to the reverse map
$\mathrm{LUT}^{-1}_{t_{obs} - t_k}$. Obviously, the recovered coefficients do not
exactly match the original ones, due to the quantization performed in the LIFs.
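The paper states only that the reverse mapping is "a simple interpolation". Since spike counts are piecewise constant in the input current, the sketch below assumes that each count is represented by the mean of the tabulated currents producing it, with linear interpolation between counts and clamping at both ends. `build_lut`, `reverse_lut`, and `lookup` are hypothetical names; for sub-band $k$, the table would be built with the effective observation time $t_{obs} - t_k$, and the same routine could invert the tabulated inner-layer maps $f^{Gang}_{t_k}$.

```python
from bisect import bisect_left

def build_lut(encode, magnitudes):
    """Off-line table of (input magnitude, encoded output) pairs for a monotone map."""
    return [(m, encode(m)) for m in magnitudes]

def reverse_lut(lut):
    """Reverse table: each distinct output is mapped to the mean input producing it."""
    groups = {}
    for m, n in lut:
        groups.setdefault(n, []).append(m)
    keys = sorted(groups)
    return keys, [sum(groups[n]) / len(groups[n]) for n in keys]

def lookup(keys, values, n):
    """Linear interpolation in the reverse table, clamped at both ends."""
    if n <= keys[0]:
        return values[0]
    if n >= keys[-1]:
        return values[-1]
    j = bisect_left(keys, n)
    if keys[j] == n:
        return values[j]
    n0, n1 = keys[j - 1], keys[j]
    w = (n - n0) / (n1 - n0)
    return values[j - 1] + w * (values[j] - values[j - 1])
```

As a usage example, with `int` standing in for a staircase quantizer, `build_lut(int, [i / 10 for i in range(50)])` maps the count 2 back to about 2.45, the center of its quantization cell.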



Recovering the input of the inner layers: Second, given a rectified current value
$\hat{I}^{Gang}_{kij}(t_{obs})$, we recover $\hat{I}^{OPL}_{kij}(t_{obs})$, the estimate
of $I^{OPL}_{kij}(t_{obs})$. In the same way as for the preceding stage, we infer the
reverse "inner layers mapping" $(f^{Gang}_{t_k})^{-1}$ through a pre-computed look-up
table. The current intensities $\hat{I}^{OPL}_{kij}(t_{obs})$, corresponding to the
retinal transform coefficients, are passed to the subsequent retinal transform decoder.

Recovering the input stimulus: Finally, given the set of $\frac{4}{3}N^2 - 1$
coefficients $\{\hat{I}^{OPL}_{kij}(t_{obs})\}$, we recover $\tilde{f}_{t_{obs}}(x, y)$,
the estimate of the original image stimulus $f(x, y)$. Although the dot product of
every pair of distinct DoG filters is approximately equal to 0, the set of filters
considered is not strictly orthonormal. We proved in [11] that there exists a dual
set of vectors enabling an exact reconstruction. Hence, the reconstruction estimate
$\tilde{f}$ of the original input $f$ can be obtained as follows:

$$\tilde{f}_{t_{obs}}(x, y) = \sum_{\{kij\}} \hat{I}^{OPL}_{kij}(t_{obs})\, \widetilde{DoG}_k(i - x, j - y), \qquad (8)$$

where $\{kij\}$ is the set of possible scales and locations in the considered dyadic
grid, and $\widetilde{DoG}_k$ are the duals of the $DoG_k$ filters, obtained as
detailed in [11]. Equation (8) defines a progressive reconstruction depending on
$t_{obs}$. This feature makes the coder time-scalable.
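The construction of the duals used in the paper is given in [11]. The sketch below instead uses the canonical dual frame obtained from the Moore-Penrose pseudo-inverse, a standard way to get a dual set for any analysis operator with full column rank. It also replaces the stacked DoG filters by a random redundant matrix, so it only illustrates why synthesizing with the original, non-orthonormal filters fails while synthesizing with duals as in (8) is exact. `canonical_duals` is a hypothetical name.

```python
import numpy as np

def canonical_duals(A):
    """Rows of the returned matrix are the canonical dual vectors of the rows of A.

    If A (M x P, M >= P) has full column rank, D = pinv(A).T satisfies D.T @ A = I,
    so f = sum_m c_m d_m with c = A f reconstructs f exactly.
    """
    return np.linalg.pinv(A).T

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 8))   # stand-in for stacked, vectorized DoG filters
f = rng.standard_normal(8)         # stand-in for a vectorized image
c = A @ f                          # analysis, as in (2)
D = canonical_duals(A)
f_rec = D.T @ c                    # synthesis with the dual vectors, as in (8)
f_naive = A.T @ c                  # synthesis with the original filters: not exact
```

Replacing `c` by quantized coefficients gives the progressive, $t_{obs}$-dependent estimate: the synthesis step is unchanged and only the accuracy of the coefficients varies.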



5 Results 

We show examples of image reconstruction using our bio-inspired coder at
different times¹. Then, we study these results in terms of quality and bit-cost.
Quality is assessed by classical image quality criteria (PSNR and mean SSIM [21]).
The cost is measured by the Shannon entropy $H(t_{obs})$ over the population of
$\{n_{kij}(t_{obs})\}$. The entropy, computed in bits per pixel (bpp) for an
$N^2$-sized image,
^ In all experiments, the model parameters are set to biologically realistic values: 
= 8 ID-IDS', = 121D"^s, = 9 1D"^ = 1.51D"^°F, = 4 ID"^ V , 
= 151D-12A, = 81D-1, = 16 ID"^ s; = 12 ID"^ 5', S = 210'^ V, 



is defined by:

$$H(t_{obs}) = \frac{1}{N^2} \sum_{k=0}^{K-1} 4^k\, H\left(\left\{n_{kij}(t_{obs}),\ (i, j) \in [0, 2^k - 1]^2\right\}\right),$$

where $K$ is the number of analyzing sub-bands.
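The bit-cost formula can be evaluated directly from the spike counts. A minimal sketch, assuming each sub-band $k$ is given as a flat list of its $4^k$ counts (so the weight $4^k$ is simply the list length) and using the empirical zeroth-order entropy, i.e. with no context modeling; `shannon_entropy` and `bit_cost_bpp` are hypothetical names.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Empirical zeroth-order Shannon entropy, in bits per symbol."""
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in Counter(symbols).values())

def bit_cost_bpp(subbands, N):
    """H(t_obs) in bits per pixel: entropy of each sub-band weighted by its size 4^k, over N^2."""
    return sum(len(band) * shannon_entropy(band) for band in subbands) / N ** 2
```

For instance, a constant one-coefficient band plus a four-coefficient band alternating between two counts, for $N = 4$, costs $(1 \cdot 0 + 4 \cdot 1)/16 = 0.25$ bpp.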

Figure 4 shows two examples of progressive reconstruction obtained with
our new coder. The new concept of time scalability is an interesting feature, as it
introduces time dynamics in the design of the coder. This is a consequence of
mimicking the actual retina. We also notice that, as expected, low frequencies
are transmitted first to get a first approximation of the image, then details are
added progressively to draw its contours. The bit-cost of the coded image is
slightly high. This can be explained by the fact that Shannon entropy is not
the most relevant metric in our case, as no context is taken into consideration,
especially the temporal context. Indeed, one can easily predict the number of
spikes at a given time $t$ knowing $n_{kij}(t - dt)$. Note also that no compression
techniques, such as bit-plane coding, have been employed yet. Our paper mainly aims
at setting the basis of new bio-inspired coding designs.

For the reasons cited above, the performance of our coding scheme in terms
of bit-cost still has to be improved to be competitive with the well-established
JPEG and JPEG2000 standards. Thus we show no comparison in this
paper. Nevertheless, these preliminary results are encouraging, and optimizing the
bit-allocation mechanism and exploiting coding techniques such as bit-plane coding [18]
would considerably improve the bit-cost. Besides, the image reconstructed
with our bio-inspired coder shows no ringing and no blocking effect. Finally, our
codec enables scalability in an original fashion through the introduction of time
dynamics within the coding mechanism.

Note also that the differentiation in the processing of sub-bands, introduced
through time-delays in the retinal transform, enables an implicit, though not yet
optimized, bit allocation. In particular, the non-linearity in the inner layers stage
amplifies singularities and contours, and these provide crucial information for
the analysis of the image. The trade-off between the emphasis placed on high
frequencies and the time-delay before their coding process starts is still an
issue to investigate.

6 Conclusion 

We proposed a new bio-inspired codec for static images. The image coder is
based on two stages. The first stage is the image transform as performed by
the outer layers of the retina. In order to integrate time dynamics, we added to
this transform sub-band-specific time delays, so that each sub-band is
processed differently. The second stage is a succession of two dynamic processing
steps mimicking the behavior of the deep retina layers. These perform an A/D
conversion and generate a spike-based, invertible retinal code for the input image
in an original fashion.

Our coding scheme offers interesting features such as (i) time-scalability, as 
the choice of the observation time of our codec enables different reconstruction 
qualities, and (ii) bit-allocation, as each sub-band of the image transform is 
separately mapped according to the corresponding state of the inner layers. 




Fig. 4. Progressive image reconstruction of Lena and Cameraman using our new
bio-inspired coder. The coded/decoded image is shown at 20 ms, 30 ms, 40 ms, and
50 ms. Rate/quality is given for each image as the triplet (bit-cost in bpp /
PSNR quality in dB / mean SSIM quality). Upper line, from left to right:
(0.07 bpp / 20.5 dB / 0.59), (0.38 bpp / 24.4 dB / 0.73), (1.0 bpp / 29.1 dB / 0.86),
and (2.1 bpp / 36.3 dB / 0.95). Lower line, from left to right: (0.005 bpp / 15.6 dB / 0.47),
(0.07 bpp / 18.9 dB / 0.57), (0.4 bpp / 23 dB / 0.71), and (1.2 bpp / 29.8 dB / 0.88).

Preliminary results are encouraging: optimizing the bit allocation and using
coding techniques such as bit-plane coding would considerably improve the cost.

This work lies at the crossroads of diverse hot topics in the fields of
neuroscience, brain-machine interfaces, and signal processing, and tries to lay the
groundwork for future efforts, especially concerning the design of new
biologically inspired coders.